Neutral genetic drift can aid functional protein evolution 

Jesse D Bloom 1 , Philip A Romero 1 , Zhongyi Lu 1 and Frances H Arnold* 1 

1 Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA 91125 USA 

Email: Jesse D Bloom - jesse.bloom@gmail.com; Philip A Romero - promero@caltech.edu; Zhongyi Lu - lu07@caltech.edu; Frances
H Arnold* - frances@cheme.caltech.edu

*Corresponding author

Abstract 

Background: Many of the mutations accumulated by naturally evolving proteins are neutral in the sense that 
they do not significantly alter a protein's ability to perform its primary biological function. However, new protein 
functions evolve when selection begins to favor other, "promiscuous" functions that are incidental to a protein's 
biological role. If mutations that are neutral with respect to a protein's primary biological function cause substantial 
changes in promiscuous functions, these mutations could enable future functional evolution. 
Results: Here we investigate this possibility experimentally by examining how cytochrome P450 enzymes that 
have evolved neutrally with respect to activity on a single substrate have changed in their abilities to catalyze 
reactions on five other substrates. We find that the enzymes have sometimes changed as much as four-fold in the 
promiscuous activities. The changes in promiscuous activities tend to increase with the number of mutations, and 
can be largely rationalized in terms of the chemical structures of the substrates. The activities on chemically similar 
substrates tend to change in a coordinated fashion, potentially providing a route for systematically predicting the 
change in one function based on the measurement of several others. 

Conclusions: Our work suggests that initially neutral genetic drift can lead to substantial changes in protein 
functions that are not currently under selection, in effect poising the proteins to more readily undergo functional 
evolution should selection "ask new questions" in the future. 



Background 

Nature employs proteins for a vast range of tasks, 
and their capacity to evolve to perform diverse func- 
tions is one of the marvels of biology. Recently, it 
has become possible to reconstruct convincing sce- 
narios for how new protein functions evolve. One 
of the most important conclusions of this work is 
that the initial steps may occur even before the 
new functions come under selection [1-6]. The reason
is that in addition to their primary biological
functions, most proteins are at least modestly effective
at performing a range of other "promiscuous"
functions [1,2,7-10]. In laboratory experiments, 
selection can rapidly increase these promiscuous 
functions, often without much immediate cost to 
a protein's original function [2]. In a particularly
compelling set of experiments, Tawfik and coworkers 
have shown that selection for promiscuous activity 
likely explains the origin and evolution of a bacterial 
enzyme that hydrolyzes a synthetic compound only 
recently introduced into the environment [2, 11, 12].
Mounting evidence therefore supports the idea that 
new protein functions evolve when selection favors 
mutations that increase an existing weak promiscu- 
ous function. 



But for as long as 50 years, since Linus Pauling 
and Emile Zuckerkandl published their seminal analysis
of molecular change in proteins [13], it has been
clear that just a small fraction of the mutations that 
accumulate in naturally evolving proteins are driven 
by selection for a new function. Instead, most of the 
mutations responsible for natural sequence diver- 
gence do not change a protein's primary biological 
function, but rather are due to either neutral ge- 
netic drift [14] or pressure for a subtle recalibration 
of protein properties unrelated to the acquisition of 
an entirely new function [15]. However, even though 
most mutations accumulate under the constraint 
that they not interfere with a protein's primary 
function, they could still substantially alter other, 
promiscuous functions. Such alterations could then 
aid in the subsequent evolution of new functions. 

Here we have experimentally investigated this 
possibility using a set of enzymes that have under- 
gone genetic drift that is neutral with respect to a 
well-defined laboratory selection criterion for enzy- 
matic activity on a single substrate [16]. We have 
examined how these enzymes have changed in their 
promiscuous activities on five other substrates. As 
described below, we find that the enzymes have often 
undergone substantial changes in their promiscuous 
activities, suggesting that neutral genetic drift could 
play an important role in enabling future functional 
evolution. 



Results and Discussion 

A set of neutrally evolved cytochrome P450 enzymes

We focused our analysis on cytochrome P450 pro- 
teins. P450s are excellent examples of enzymes that 
can evolve to catalyze new reactions, since they are 
involved in a wide range of important functions such 
as drug metabolism and steroid biosynthesis [17, 18]. 



We worked with P450 BM3, a cytosolic bacterial 
enzyme that catalyzes the subterminal hydroxyla- 
tion of medium- and long-chain fatty acids [19]. 
We have previously described a set of P450 BM3 
heme domain variants that were created by lab- 
oratory neutral evolution from a common parent 
sequence [16]. Here we briefly recap the procedure 
used to create these P450s in order to explain their 
origin and why they can properly be viewed as the 
product of neutral genetic drift. 



The essential difference between neutral genetic 
drift and adaptive evolution is that in the former 
case mutations that have no substantial effect on 
fitness spread stochastically in a population, while 
in the latter case mutations spread because they are 
beneficial and so favored by selection. Of course, it 
may be difficult to discern whether a specific muta- 
tion in a natural population has spread neutrally or 
due to favorable selection. But in the laboratory it 
is possible to define an arbitrary selection criterion 
to ensure that all mutations spread due to neutral 
genetic drift. Specifically, we imposed the require- 
ment that the P450s had to hydroxylate the sub- 
strate 12-p-nitrophenoxydodecanoic acid (12-pNCA) 
with an activity exceeding a specific threshold [16]. 
All mutant P450s were therefore straightforwardly 
classified as either functional (if they exceeded the 
threshold) or nonfunctional (if they did not). While 
this selection criterion is obviously a simplification 
of natural evolution, we believe that for the cur- 
rent purpose it is a reasonable abstraction of the 
evolutionary requirement that an enzyme's primary 
activity exceed some critical level in order to allow 
its host organism to robustly survive and reproduce. 
To implement laboratory neutral evolution using 
this selection criterion, we began with a single parent
P450 BM3 heme domain variant (called R1-11)
and used error-prone PCR to create random mutants 
of this parent [16]. Mutants that failed to yield suf- 
ficient active protein to hydroxylate at least 75% of 
the 12-pNCA of the R1-11 parent when expressed in
Escherichia coli were immediately eliminated, while 
all other mutants were carried over to the next gen- 
eration with equal probability. Any mutations that 
spread among the offspring sequences were there- 
fore by definition due to neutral genetic drift, since 
there was no opportunity for any functional mutant 
to be favored over any other. We emphasize that 
the fact that the mutations spread due to neutral 
genetic drift does not mean that they have no effect
on the protein's properties. Indeed, one of the
growing realizations about protein evolution is that 
mutations that spread by neutral genetic drift may 
still have an impact on future evolution [20,21]. One 
mechanism for this impact is that neutral genetic 
drift can change a protein's stability and so alter 
its tolerance to future mutations [22-24]. As will 
be demonstrated below, another mechanism is that 
neutral genetic drift can alter a protein's promiscu- 
ous functions. 
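The selection scheme described above can be sketched as a small simulation. This is an illustrative toy rather than the procedure of [16]: the function names, the Gaussian model for how a mutation perturbs activity, the population size, and the number of generations are all assumptions introduced here. Only the 75% threshold and the equal-probability propagation of functional mutants come from the text.

```python
import random

THRESHOLD = 0.75  # required fraction of parental 12-pNCA conversion (from the text)


def mutate(activity, rng, effect_sd=0.15):
    """Hypothetical mutation model: Gaussian perturbation of activity (assumption)."""
    return max(0.0, activity + rng.gauss(0.0, effect_sd))


def neutral_generation(population, rng):
    """One generation: mutate every variant, discard nonfunctional mutants,
    then resample the survivors uniformly so no functional variant is favored."""
    mutants = [mutate(a, rng) for a in population]
    survivors = [a for a in mutants if a >= THRESHOLD]
    if not survivors:  # every mutant failed: retain the previous generation
        return list(population)
    return [rng.choice(survivors) for _ in population]


def neutral_evolution(n_variants=22, generations=25, seed=0):
    """Evolve a population that starts as copies of the parent (activity 1.0)."""
    rng = random.Random(seed)
    population = [1.0] * n_variants
    for _ in range(generations):
        population = neutral_generation(population, rng)
    return population


final_population = neutral_evolution()
```

Because survivors are drawn uniformly, a variant at 76% of parental activity is exactly as likely to propagate as one at 120%, which is what makes the drift neutral with respect to the selection criterion.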



As described previously [16], the end result of 
the neutral evolution was 44 different P450 vari- 
ants, each of which satisfied the selection criterion 
for activity on 12-pNCA (these are the combined 
final sequences from the monomorphic and poly- 
morphic populations in [16]). For the current study, 
we analyzed the promiscuous activities of 34 of these 
neutrally evolved P450 variants. The sequence di- 
versity of these P450s is shown in the phylogenetic 
tree of Figure [1]; they have accumulated an average
of four nonsynonymous mutations each. 



Activities of the neutrally evolved P450 enzymes

All of the P450 variants had evolved under selection
solely for their ability to hydroxylate 12-pNCA. We
examined their promiscuous hydroxylation activities
on the five other substrates shown at the top of
Figure [2]. Two of these promiscuous substrates,
propranolol and 2-amino-5-chlorobenzoxazole (also known
as zoxazolamine), are drugs that are metabolized
by human P450s [25,26]. The other three promiscuous
substrates, 11-phenoxyundecanoic acid,
2-phenoxyethanol, and 1,2-methylenedioxybenzene,
are organic compounds of increasing structural
dissimilarity to 12-pNCA. The parent P450 possessed
at least some hydroxylation activity on all of these
substrates (throughout the remainder of this work,
"activity" refers to total substrate turnovers per
enzyme).

We measured the activities of all 34 neutrally
evolved P450s on the five promiscuous substrates as
well as 12-pNCA. Figure [2] shows the fold change in
activity of each of the variants relative to the parent
P450 on all six substrates, and Figure [3] shows
the same data with standard errors. As is apparent
from these figures, many of the neutrally evolved
P450s have undergone changes in their activities
that substantially exceeded the standard errors of
the measurements. Even on 12-pNCA, some of the
variants have undergone modest increases or very
mild decreases in activity. The modest increases in
12-pNCA activity were unsurprising, since the parent
P450 only hydroxylates 12-pNCA with about
a quarter of the activity reported for a P450 engineered
for maximal 12-pNCA activity [27]. Likewise,
the mild decreases in 12-pNCA activity were due to
the fact that during neutral evolution the P450s
were only required to maintain this activity above a
minimal threshold (75% of the total 12-pNCA conversion
of the parent protein when expressed in E.
coli [16]). The changes in the promiscuous activities
were often much larger than those on 12-pNCA. For
example, several of the neutrally evolved variants
have undergone nearly four-fold increases in activity
on one or more of 2-phenoxyethanol,
2-amino-5-chlorobenzoxazole, and 1,2-methylenedioxybenzene.
Other variants have experienced equally large decreases
in one or more of the promiscuous activities.



Broad patterns of change in activity can be rationalized in terms of substrate properties

The data in Figures [2] and [3] clearly indicate that 
some of the P450s have undergone substantial 
changes in their activities. In an effort to under- 
stand the nature of these changes, we sought to 
determine whether there were any clear patterns 
in the activities. In Figure [2], the substrates have
been hierarchically clustered so that each succes- 
sive cluster contains substrates on which the P450s 
have increasingly similar activities (the clustering is 
illustrated by the tree-like dendrogram at the top 
of the figure, with similar substrates in adjacent 
columns). The clustering of the substrates is readily
rationalized in terms of their chemical structures. 
For example, 2-amino-5-chlorobenzoxazole and 1,2- 
methylenedioxybenzene cluster, meaning that P450s 
with high activity on one of these substrates also 
tend to have high activity on the other. Presumably, 
they cluster because the similarity of their structures 
(both are fusions of six and five membered rings) 
means that they have similar modes of docking in the
substrate binding pocket. Likewise, 12-pNCA and
11-phenoxyundecanoic acid are phenoxycarboxylic
acids of similar chain length, and are in the same
cluster. To a lesser extent, 2-phenoxyethanol resembles
12-pNCA and 11-phenoxyundecanoic acid in its
phenolic ether structure, and it falls into a higher
level cluster with these two substrates. Propranolol
shares a fused ring structure with
2-amino-5-chlorobenzoxazole and 1,2-methylenedioxybenzene,
and these three substrates share a common higher 
level cluster. Overall, the hierarchical clustering in- 
dicates that substrates that appear similar to the 
human eye are also "seen" this way by the P450s, 
since the P450s tend to increase or decrease their 
activities on these substrates in a coordinated fash- 
ion. 
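The kind of substrate clustering described above can be sketched as follows. The text does not state which distance measure or linkage rule was used, so this sketch assumes a distance of one minus the Pearson correlation between substrate activity profiles and average linkage. The function names and the toy activity changes are invented for illustration and are not the measured data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cluster_substrates(profiles):
    """Agglomerative clustering of substrates.

    profiles maps substrate name -> activity changes across the variants.
    Distance is 1 - Pearson correlation, linkage is average (both assumptions).
    Returns the merges in the order they occur, most similar pair first.
    """
    names = list(profiles)
    dist = {frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
            for i, a in enumerate(names) for b in names[i + 1:]}
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        def linkage(pair):
            c1, c2 = pair
            total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
            return total / (len(c1) * len(c2))
        pairs = [(c1, c2) for i, c1 in enumerate(clusters) for c2 in clusters[i + 1:]]
        best = min(pairs, key=linkage)
        clusters = [c for c in clusters if c not in best] + [best[0] + best[1]]
        merges.append(best)
    return merges


# Invented log2 fold changes for five hypothetical variants.
toy = {
    "12-pNCA": [0.1, -0.2, 0.3, 0.0, -0.1],
    "11-phenoxyundecanoic acid": [0.2, -0.3, 0.4, 0.1, -0.2],
    "1,2-methylenedioxybenzene": [-0.5, 1.0, 0.2, -1.2, 0.6],
}
merges = cluster_substrates(toy)
```

In this toy example the two phenoxycarboxylic acids merge first because their profiles rise and fall together, mirroring the coordinated changes described in the text.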



Figure [2] also shows the P450 variants arranged 
in hierarchical clusters. A visual inspection imme- 
diately indicates that there is an overall association 
among all of the activities. Some of the P450 vari- 
ants (redder rows) tend to show improved activity 
on most substrates, while others (bluer rows) tend to 
show decreased activity on most substrates. Taken 
together with the clustering of the similar substrates, 
this overall association suggests that there are two 
main trends in the activity changes. First, the P450s 
appear to have undergone general changes in their 
catalytic abilities that are manifested by broad in- 
creases or decreases in activity on all substrates. 
Second, the P450s appear to have experienced shifts 
in specificity to favor either the fused ring or the 
phenolic ether substrates. 

To test whether these two apparent trends in 
activity changes are supported by a quantitative 
examination of the data, we performed principal 
component analysis. Principal component analy- 
sis is a well-established mathematical technique for 
finding the dominant components of variation in a 
data set, essentially by diagonalizing the covariance 
matrix. As suggested by the foregoing visual inspec- 
tion, principal component analysis revealed that two 
components explained most of the changes in P450 
activity (Table [1]). The first component contained 
positive contributions from all six substrates, and so 
represents a general improvement in catalytic ability. 
The second component contained positive contributions
from the fused ring substrates and negative
contributions from the phenolic ether substrates,
and so represents an increased preference for the 
former class of substrates over the latter. Together, 
these two components explain 82% of the variance 
in activities among the 34 P450 variants. The re- 
maining 18% of the variance is explained by the four 
remaining components, which represent more subtle 
shifts in activity that are less easily rationalized with 
intuitive chemical arguments. 
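For reference, the diagonalization underlying this analysis can be written out explicitly. The notation below is introduced here for illustration only. Whether the activities were log-transformed or otherwise scaled before the analysis is not stated in the text, so the activity measure x is left generic.

```latex
% x_{ij}: activity measure of variant i (i = 1, ..., n; n = 34) on substrate j (j = 1, ..., 6)
\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad
C_{jk} = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_{ij}-\bar{x}_j\right)\left(x_{ik}-\bar{x}_k\right)

% Diagonalize the 6 x 6 covariance matrix; the eigenvectors are the principal components
C\,\mathbf{v}_m = \lambda_m \mathbf{v}_m, \qquad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_6 \ge 0

% Fraction of the total variance explained by component m
f_m = \frac{\lambda_m}{\sum_{l=1}^{6} \lambda_l}
```

In this notation, the result reported above is f_1 + f_2 = 0.82, with all six entries of v_1 positive and the entries of v_2 positive for the fused ring substrates and negative for the phenolic ether substrates.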



Overall distributions of change in the activities 

The preceding sections have demonstrated that neu- 
tral genetic drift can lead to substantial changes in 
P450 activities, and that many of these changes can 
be understood as resulting from either fairly general 
increases/decreases in catalytic ability or shifts in 
preference for different broad classes of substrate 
structures. In this section, we examine whether 
there are any pervasive trends in the distributions 
of activity changes — for example, did most of the 
promiscuous activities tend to increase or decrease? 
If a property is not under any evolutionary con- 
straint, then during neutral genetic drift its values 
might be expected to be distributed in a roughly 
Gaussian fashion, as the neutrally evolving proteins 
freely sample from the presumably normal underly- 
ing distribution. On the other hand, if a property is 
constrained by selection to remain above a certain 
threshold, then during neutral genetic drift its val- 
ues should display a truncated distribution since se- 
lection culls proteins with values that fall below the 
threshold (such a distribution has been predicted for 
protein stability by simulations [28] and theory [21]). 
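The contrast between the two cases can be illustrated with a toy simulation. The normal distribution parameters, the threshold value, and the variable names are arbitrary assumptions chosen for illustration and are not fitted to the P450 data.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical unconstrained property: values sampled from a normal distribution.
unconstrained = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

# Hypothetical constrained property: selection culls values below a threshold,
# leaving a distribution that is truncated on the left.
threshold = -0.5
constrained = [v for v in unconstrained if v >= threshold]

# Culling the low tail gives a hard lower edge and shifts the mean upward.
mean_unconstrained = statistics.mean(unconstrained)
mean_constrained = statistics.mean(constrained)
```

A histogram of `constrained` would show an abrupt left edge at the threshold, whereas `unconstrained` remains roughly symmetric about its mean.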

Figure [4] shows the distribution of changes in activity
for each of the six substrates. The distribution
for 12-pNCA appears to be truncated on the left, 
as expected since the P450s neutrally evolved under 
a requirement to maintain the ability to hydroxy- 
late 12-pNCA. Some of the P450s have undergone 
a mild decrease in 12-pNCA activity, reflective of 
the fact that the neutral evolution selection crite- 
rion provided a small amount of latitude by allow- 
ing the total amount of hydroxylated 12-pNCA to 
drop to 75% of the parental value [16]. A number 
of P450s have neutrally evolved 12-pNCA activity 
that modestly exceeds that of the parent — again 
unsurprising, because the parental 12-pNCA activity
falls well below the maximal value achievable
for this type of protein [27]. The distribution for 
11-phenoxyundecanoic acid resembles that for 12-pNCA,
probably because activities on these two
chemically similar substrates are highly linked, as 
discussed in the previous section. 

The other four promiscuous activities are less 
linked to 12-pNCA activity, and their distribu- 
tions are much more symmetric. The symmetric 
shapes of these distributions suggest that neu- 
tral genetic drift has sampled from a roughly 
Gaussian distribution for these four promiscuous
activities. For three of the substrates (propranolol,
2-amino-5-chlorobenzoxazole, and 1,2-methylenedioxybenzene),
the distributions of activities
are approximately centered around the parental
activity. This centering indicates that the promis- 
cuous activities of the parent on these three sub- 
strates are typical of what would be expected of 
a neutrally evolved P450. The distribution for 2- 
phenoxyethanol, on the other hand, is shifted to- 
wards activities higher than that of the parent. This 
shift indicates that the parent is less active on 2- 
phenoxyethanol than a typical neutrally evolved 
P450. 



If the activity distributions of Figure [4] truly
reflect what would be expected after a very long 
period of neutral genetic drift (i.e., if they are "equi- 
librium" distributions), then each variant represents 
a random sample from the underlying distribution 
of activities among all P450s that can neutrally 
evolve under this selection criterion. In this case, 
there should be no correlation between the extent 
of change in activity and the number of accumu- 
lated mutations, since the P450s should have lost 
all "memory" of the parent's activity. On the other 
hand, if there has not been enough neutral genetic 
drift to completely eliminate residual memory of the 
parent's activity, then variants with fewer mutations 
should more closely resemble the parent's activity 
profile. To test whether the activity distributions of 
the P450 variants had equilibrated, we computed the 
correlation between the magnitude of each variant's 
change in activity and the number of nonsynony- 
mous mutations it possessed relative to the parent. 
Table [2] shows that the magnitude of activity change
is positively correlated with the number of mutations
for all six substrates. Although the correlations for 
the individual substrates are mostly not statistically 
significant due to the small number of samples, the 
overall correlation for all six substrates is highly significant
(P = 10^-3). Therefore, the P450 activities
are still in the process of diverging from the parental 
values by neutral genetic drift. If the variants were 
to undergo further neutral genetic drift, we would 
expect to see even larger changes in their promiscu- 
ous activities. 
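A correlation of this kind can be computed along the following lines. The text does not specify which correlation coefficient was used, so this sketch assumes the Pearson coefficient and takes the magnitude of change to be the absolute value of the base-two logarithm of the fold change, the definition used for the binding-pocket comparison in this section. The data points are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def change_magnitude(fold_change):
    """|log2(fold change)|: a two-fold gain and a two-fold loss both score 1."""
    return abs(math.log2(fold_change))


# Invented data: (nonsynonymous mutations, fold change in activity on one substrate).
toy_variants = [(1, 1.1), (2, 0.8), (3, 1.5), (5, 0.5), (6, 2.4), (8, 3.0)]
mutations = [m for m, _ in toy_variants]
magnitudes = [change_magnitude(f) for _, f in toy_variants]
r = pearson(mutations, magnitudes)
```

Taking the absolute value of the logarithm treats increases and decreases symmetrically, so a positive `r` indicates divergence from the parent in either direction as mutations accumulate.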



We also examined whether P450 variants with 
mutations near the substrate binding pocket were 
more likely to have undergone large changes in their 
activities. Five of the P450 variants had a mutation 
to a residue that was within 5 Å of the surrogate
substrate in the P450 BM3 crystal structure [29]: 
variant M2 had A74V, M8 had A330V, M13 had 
M354I, M15 had A74P, and M24 had I263V [16]. 
Two of these mutated residues are of clear impor- 
tance, since mutating residue 74 has previously been 
shown to shift substrate specificity [25, 30, 31] and 
residue 263 plays a role in the substrate-induced 
conformational shift [32]. We compared the activity
changes for the five variants with mutations near 
the binding pocket to those for the 29 variants with- 
out any such mutations, computing the magnitude 
of activity change as the absolute value of the log- 
arithm (base two) of the fold change in activity 
averaged over all six substrates. The average mag- 
nitude of activity change for the five variants with 
mutations near the active site was 0.88, while the 
average for the other 29 variants was 0.47. These 
averages are significantly different, with an unequal 
variance t-test P-value of 10^-2. Therefore, variants
with mutations near the substrate binding pocket 
are especially likely to have altered activities, al- 
though many variants without mutations near the 
pocket also underwent substantial activity changes. 
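The comparison above can be sketched in code. The function names are ours and the example inputs in the usage note are invented. The sketch computes the per-variant magnitude as defined in the text and the unequal-variance (Welch) t statistic with its Welch-Satterthwaite degrees of freedom. Converting the statistic to a P-value requires the t distribution, which the Python standard library does not provide.

```python
import math
import statistics


def mean_change_magnitude(fold_changes):
    """Average over substrates of |log2(fold change)| for a single variant."""
    return statistics.mean(abs(math.log2(f)) for f in fold_changes)


def welch_t(sample_a, sample_b):
    """Welch's unequal-variance t statistic and its approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va = statistics.variance(sample_a) / na  # squared standard error of mean A
    vb = statistics.variance(sample_b) / nb  # squared standard error of mean B
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df
```

For example, `welch_t([1, 2, 3], [2, 3, 4])` gives a t statistic of about -1.22 with 4 degrees of freedom. With SciPy available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and also returns the P-value.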



Conclusions 

We have shown that neutral genetic drift can lead 
to changes of as much as four-fold in the promis- 
cuous activities of P450 proteins. The ubiquity 
of these changes is striking — even though many 
of the neutrally evolved P450s had only a handful
of mutations, most of them had experienced
at least some change in their promiscuous activi- 
ties. P450s may be especially prone to this type 
of change, since their catalytic mechanism involves 
large substrate-induced conformational shifts [33] 
that can be modulated by mutations distant from 
the active site [30,34,35]. In addition, P450s have a 
tendency to eventually undergo irreversible inactiva- 
tion that can be promoted by reduced coupling be- 
tween substrate binding and conformational shifts, 
as well as by other poorly understood determinants 
of catalytic stability [19,36,37]. There are therefore 
ample opportunities for mutations that spread by 
neutral genetic drift to cause subtle alterations in a 
P450's promiscuous activities. But we believe that 
neutral genetic drift is also likely to cause substan- 
tial changes in the promiscuous activities of enzymes 
with other catalytic mechanisms. In support of this 
idea, a recent study by Tawfik and coworkers [38] 
indicates that mutations with little effect on the 
native lactonase activity of serum paraoxonase can 
alter this enzyme's promiscuous activities. Taken 
together, this study and our work suggest that neu- 
tral genetic drift allows for changes in promiscuous 
protein functions. These changes could in turn have 
important implications for future functional evolu- 
tion. For example, one can easily imagine a scenario 
in which neutral genetic drift enhances a promis- 
cuous protein function, and then a subsequent gene 
duplication allows natural selection to transform one 
of the genes into the template for a protein with a 
full-fledged new functional role [1-6]. 

One of the most attractive aspects of our study 
is the degree to which the changes in P450 activities 
during neutral genetic drift could be understood in 
terms of the chemical structures of the substrates. 
Neutral genetic drift did not simply cause unpre- 
dictable shifts in activities. Instead, most of the 
variation was explained by two eminently intuitive 
components: an overall increase or decrease in cat- 
alytic ability, and a preference for either fused ring 
or phenolic ether substrates. We have suggested that 
neutral genetic drift under a fixed selection criterion 
can be viewed as sampling underlying "equilibrium" 
distributions of activities. The distributions for dif- 
ferent activities are linked, since we have shown that 
P450s with good activity on one substrate will frequently
also be highly active on chemically similar
substrates (similar linkages have been observed in
P450s created by recombination [39]). So while it 
may be impossible to know exactly how any specific 
mutation will affect a given activity, measuring a 
handful of activities allows one to make relatively 
accurate predictions about other closely linked activ- 
ities. The prerequisite for making such predictions 
is an understanding of the linkages among activities 
in the set of sequences explored by neutral genetic 
drift (the neutral network). We have made the 
first steps in elucidating these linkages for P450s 
that have neutrally evolved under one specific selec- 
tion regime. The linkages are very similar to those 
that would have been made by an organic chemist 
grouping the substrates on the basis of their chem- 
ical structures. Knowledge of these linkages is of 
use in understanding the origins of enzyme speci- 
ficity [10,40] — if an enzyme displays high activity 
on one substrate but low activity on another, then 
either these two activities are negatively linked dur- 
ing neutral genetic drift or selection has explicitly 
disfavored one of them. 



Our work also has implications for the general 
relationship between neutral genetic drift and adap- 
tive evolution. A number of studies focused on 
RNA [41-43] or computational systems [44,45] have 
suggested that genetic drift might aid in adaptive 
evolution. Our study and that of Tawfik and cowork- 
ers [38] support this notion for the evolution of new 
protein functions. However, the way that drift in 
promiscuous functions promotes adaptive evolution 
is slightly different than the paradigm proposed for 
RNA [41-43] and computational systems [44,45]. In 
those systems, neutral genetic drift is envisioned as 
allowing a sequence to move along its neutral net- 
work until it reaches a position where it can jump 
to a new higher-fitness and non-overlapping neutral 
network. In contrast, promiscuous protein functions 
change even as a protein drifts along a single neu- 
tral network. The adaptive benefits of this drift 
come when new selective pressures suddenly favor a 
previously irrelevant promiscuous function, in effect 
creating a new neutral network that overlaps with 
parts of the old one. 

Overall, experiments have now demonstrated 
two clear mechanisms by which neutral genetic drift 
can aid in the evolution of protein functions. In the 
first mechanism, neutral genetic drift fixes a mutation
that increases a protein's stability [20,21,46],
thereby improving the protein's tolerance for subsequent
mutations [22-24], some of which may confer
new or improved functions [24]. In the second mechanism,
which was the focus of this work and the
recent study by Tawfik and coworkers [38], neu- 
tral genetic drift enhances a promiscuous protein 
function. This enhancement poises the protein to 
undergo adaptive evolution should a change in se- 
lection pressures make the promiscuous function 
beneficial at some point in the future. 



Methods 

Determination of P450 activities 

We attempted to determine the activities of all 44 
neutrally evolved P450 variants described in [16] 
(22 from the final monomorphic populations and 
22 from the final polymorphic population). Ten 
of these variants expressed relatively poorly in the 
procedure used here (as described in more detail 
below), and so were eliminated from further anal- 
ysis since their low expression led to large errors 
in the activity measurements. That left activity 
data for the 34 neutrally evolved P450 variants 
listed in Figures [2] and [3], as well as for the R1-11
neutral evolution parent. The activities for
each of these P450 variants were measured on all 
six substrates (12-pNCA, 2-phenoxyethanol, pro- 
pranolol, 11-phenoxyundecanoic acid, 2-amino-5- 
chlorobenzoxazole, and 1,2-methylenedioxybenzene). 
In all cases, the activities represent the total amount 
of product produced after two hours, and so are in 
units of total turnovers per enzyme. P450 BM3 
enzymes typically catalyze only a finite number of 
reaction cycles before becoming irreversibly inacti- 
vated, and we believe that all reactions were essen- 
tially complete after two hours, so these activities 
should represent the total turnovers of the enzymes 
during their catalytic lifetimes. 

To obtain P450 protein for the activity measure- 
ments, we expressed the protein using catalase-free 
Escherichia coli [47] containing the encoding gene on 
the isopropyl β-D-thiogalactoside (IPTG) inducible
pCWori [47] plasmid (the catalase is removed since
it breaks down the hydrogen peroxide used by the
P450). The sequences of the P450 variants are de- 
tailed in [16]. We used freshly streaked cells to 
inoculate 2 ml cultures of Luria Broth (LB) supplemented
with 100 μg/ml of ampicillin, and grew these
starter cultures overnight with shaking at 37°C. We 
then used 0.5 ml from these starter cultures to in- 
oculate 1 L flasks containing 200 ml of terrific broth 
(TB) supplemented with 100 μg/ml of ampicillin.
The TB cultures were grown at 30°C and 210 rpm 
until they reached an optical density at 600 nm of 
≈0.9, at which point IPTG and δ-aminolevulinic
acid were added to a final concentration of 0.5 mM 
each. The cultures were grown for an additional 19 
hours, then the cells were harvested by pelleting 50
ml aliquots at 5,500 g and 4°C for 10 min, and stored 
at -20°C. To obtain clarified lysate, each pellet was 
resuspended in 8 ml of 100 mM [4-(2-hydroxyethyl)- 
1-piperazinepropanesulfonic acid] (EPPS), pH 8.2 
and lysed by sonication, while being kept on ice. 
The cell debris was pelleted by centrifugation at
8,000 g and 4°C for 10 minutes, and the clarified 
lysate was decanted and kept on ice. 

To perform the assays, various dilutions of the 
clarified lysate were used to construct a standard 
curve. For each sample, we prepared dilutions of the 
clarified lysate in the 100 mM EPPS (pH 8.2) buffer 
to create samples for the standard curves. The di- 
lutions were 100% clarified lysate (undiluted), 67% 
lysate, 40% lysate, 25% lysate, 17% lysate, 10% 
lysate, 6.7% lysate, and 4.0% lysate. Similar dilu- 
tions were also prepared of the clarified lysate of E. 
coli cells carrying a null pCWori plasmid in order to 
assess the background readings from lysate without 
any P450. A pipetting robot was then used to dispense
80 μl of this series of clarified lysate dilutions
into 96-well microtiter plates. Duplicate microtiter
plates were then assayed for P450 concentration 
and total enzymatic activity on each of the six substrates.
The R1-11 parent was assayed four times
rather than in duplicate. To minimize variation, all 
of these assays were performed in parallel, with the 
same stock solutions, and on the same day. 

The P450 concentration was determined using 
the carbon monoxide (CO) difference spectrum as- 
say [48]. Immediately before use, we prepared a 
5x stock solution of 50 mM sodium hydrosulfite
in 1.3 M potassium phosphate, pH 8.0. A multichannel
pipette was used to add 20 μl of this stock
solution to each well of the microtiter plates (which
contained 80 μl of a dilution of clarified lysate),
so that the final sodium hydrosulfite concentration 
was 10 mM in each well. The plates were briefly 
mixed and the absorbances were read at 450 and 
490 nm. The plates were then incubated in a CO 
binding oven [48] for 10 minutes to bind CO to the 
iron. The absorbance was then again read at 450 
and 490 nm. The amount of P450 is proportional to 
the increase in the magnitude of the absorbance at 
450 nm minus the absorbance at 490 nm. At each 
dilution along the standard curve, the reading for 
the null control (lysate dilutions without P450) was 
subtracted from the reading for each P450 variant to 
control for clarified lysate background. Additional 
file 1 shows the standard curves for all P450 variants.
Ten P450 variants had standard curve slopes
less than or equal to 0.020, indicating a low P450 
concentration. These were the ten P450 variants 
that we discarded from further analysis, since the 
low P450 concentration decreased the accuracy of 
the measurements. 
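
The CO difference reading and null subtraction described above can be sketched in a few lines of Python. This is a minimal illustration and not the analysis code used in the study: the function names and the example absorbance values are hypothetical, and an ordinary least-squares fit with an intercept is assumed for the standard curve slope.

```python
# Lysate fractions of the eight-point dilution series described in the text.
DILUTIONS = [1.00, 0.67, 0.40, 0.25, 0.17, 0.10, 0.067, 0.040]


def co_signal(before, after):
    """Increase in (A450 - A490) for one well after CO binding.

    `before` and `after` are (A450, A490) pairs read before and after
    the CO incubation.
    """
    return (after[0] - after[1]) - (before[0] - before[1])


def ols_slope(x, y):
    """Ordinary least-squares slope of y against x (intercept allowed)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def standard_curve_slope(variant_signals, null_signals, dilutions=DILUTIONS):
    """Slope of the null-subtracted CO signal against lysate fraction."""
    corrected = [v - n for v, n in zip(variant_signals, null_signals)]
    return ols_slope(dilutions, corrected)


# Hypothetical CO signals: the null lysate contributes 0.01 per unit lysate
# fraction, and the P450 variant adds a further 0.05 on top of that.
null_signals = [0.01 * d for d in DILUTIONS]
variant_signals = [0.06 * d for d in DILUTIONS]

slope = standard_curve_slope(variant_signals, null_signals)
passes_cutoff = slope > 0.020  # slopes <= 0.020 were discarded in the study
```

With these made-up readings the null-subtracted slope is 0.05, so the variant would be retained under the 0.020 cutoff.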



To determine the activity on 12-pNCA, we mon- 
itored the formation of the yellow 4-nitrophenolate 
compound that is released upon hydroxylation of 
the twelfth carbon in the 12-pNCA molecule [27,49]. 
Immediately before use, we prepared a 6x stock so- 
lution of 12-pNCA by adding 3.6 parts of 4.17 mM 
12-pNCA in DMSO to 6.4 parts 100 mM EPPS, pH 
8.2. A multichannel pipette was used to add 20 μl
of this stock solution to each well of the microtiter
plates (which contained 80 μl of a dilution of clarified
lysate). The plates were briefly mixed, and
the absorbance was read at 398 nm. To initiate the 
reactions, we then prepared a 6x stock solution of 
24 mM hydrogen peroxide in 100 mM EPPS, pH 
8.2, and immediately added 20 μl of this solution
to each well of the microtiter plate and mixed. The
final assay conditions were therefore 6% DMSO, 250
μM 12-pNCA, and 4 mM hydrogen peroxide. The
reactions were incubated on the benchtop for two 
hours, and the total amount of enzymatic product 
was quantified by the gain in absorbance at 398 nm. 
At each dilution along the standard curve, the corre- 
sponding null control lysate dilution was subtracted 
from the reading to control for lysate background. 
Additional file 1 shows the standard curves for all
P450 variants.
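
The final 12-pNCA assay conditions follow from simple dilution arithmetic: each stock is added as 20 μl to a well whose final volume is 120 μl (80 μl lysate, 20 μl substrate stock, 20 μl peroxide stock). The short Python check below reproduces the stated values; the variable and function names are hypothetical.

```python
# Volumes per well, in microliters, as given in the text.
LYSATE_UL = 80.0
SUBSTRATE_STOCK_UL = 20.0
PEROXIDE_STOCK_UL = 20.0
FINAL_UL = LYSATE_UL + SUBSTRATE_STOCK_UL + PEROXIDE_STOCK_UL  # 120 ul

# The 12-pNCA stock is 3.6 parts of 4.17 mM 12-pNCA in DMSO plus
# 6.4 parts of EPPS buffer, so it is 36% DMSO by volume.
dmso_fraction_in_stock = 3.6 / (3.6 + 6.4)
pnca_stock_mM = 4.17 * dmso_fraction_in_stock  # about 1.5 mM
peroxide_stock_mM = 24.0


def final_concentration(stock_conc, stock_volume_ul, final_volume_ul=FINAL_UL):
    """Concentration after diluting a stock into the final well volume."""
    return stock_conc * stock_volume_ul / final_volume_ul


final_dmso_percent = 100 * final_concentration(
    dmso_fraction_in_stock, SUBSTRATE_STOCK_UL
)
final_pnca_uM = 1000 * final_concentration(pnca_stock_mM, SUBSTRATE_STOCK_UL)
final_peroxide_mM = final_concentration(peroxide_stock_mM, PEROXIDE_STOCK_UL)

print(f"DMSO: {final_dmso_percent:.1f}%")
print(f"12-pNCA: {final_pnca_uM:.0f} uM")
print(f"H2O2: {final_peroxide_mM:.1f} mM")
```

The same 1:6 dilution applies to the substrate stocks used in the 4-AAP assays, since they are added in the same 20 μl volume.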



The activities on 2-phenoxyethanol, propranolol,
11-phenoxyundecanoic acid, 2-amino-5-chlorobenzoxazole,
and 1,2-methylenedioxybenzene
were determined using the 4-aminoantipyrene (4-AAP)
assay [50,51], which detects the formation
of phenolic compounds. For each of these five sub- 
strates, immediately before use we prepared a 6x 
substrate stock solution. These stock solutions were 
6% DMSO and 6% acetone in 100 mM EPPS, pH 
8.2, with an amount of substrate added so that 
the substrate concentrations in the stock solutions 
were: 150 mM for 2-phenoxyethanol, 30 mM for propranolol,
5 mM for 11-phenoxyundecanoic acid, 12
mM for 2-amino-5-chlorobenzoxazole, and 120 mM 
for 1,2-methylenedioxybenzene. The stock solutions 
were prepared by first dissolving the substrate in 
the DMSO and acetone, and then adding the EPPS 
buffer. In some cases, the stock solution became 
cloudy upon addition of the buffer, but there was 
no immediate precipitation, so we could still pipette 
the stock solution. A multichannel pipette was used 
to add 20 μl of the appropriate substrate stock solution
to each well of the microtiter plates (which
contained 80 μl of a dilution of clarified lysate). To
initiate the reactions, we then added 20 μl of the
freshly prepared 6x hydrogen peroxide stock solu- 
tion (24 mM hydrogen peroxide in 100 mM EPPS, 
pH 8.2) and mixed. We incubated the plates on the 
benchtop for two hours. To detect the formation 
of phenolic products, a pipetting robot was used to 
add and mix 120 μl of quench buffer (4 M urea in
100 mM sodium hydroxide) to each well. We then
used the robot to add and mix 36 μl per well of
0.6% (w/v) of 4-aminoantipyrene in distilled wa- 
ter, and immediately read the absorbance at 500 
nm. To catalyze formation of the red compound 
produced by coupling a phenolic compound to 4- 
aminoantipyrene [50,51], we then used the pipetting 
robot to add and mix 36 μl per well of 0.6% (w/v)
of potassium peroxodisulfate in distilled water. The 
plates were incubated on the benchtop for 30 min- 
utes, and the amount of product was quantified by 
the gain in absorbance at 500 nm. At each dilu- 
tion along the standard curve, the corresponding 
null control lysate dilution was subtracted from the 
reading to control for lysate background. Additional
file 1 shows the standard curves for all P450 variants.

In order to extract enzymatic activities from the
standard curves, we fit lines to the data points. For 
some of the substrates (most notably 12-pNCA and 
2-phenoxyethanol), many of the P450 variants were 
sufficiently active to either saturate the substrate 
or exceed the linear range of absorbance readings. 
Therefore, we examined each standard curve by eye 
to determine which points remained in the linear 
range. Lines were then fit to the points in the linear 
range. These fits are shown in Additional file 1. In
the plots in this file, all points that were deemed 
to fall in the linear range (and so were used for the 
fits) are shown as filled shapes, while all points that 
were deemed to fall outside the linear range (and 
so were not used in the fits) are shown as empty 
shapes. The figures show the slopes of the lines for 
all replicates (two replicates for all P450 variants 
except for R1-11, which had four replicates). These
slopes are averaged for a best estimate of the slope, 
and the standard error computed over these two 
measurements is also reported. 
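
As a rough sketch of the slope estimate described above, the points judged to lie in the linear range are fit by least squares, and the replicate slopes are then averaged with a standard error. The linear-range mask below is supplied by hand, mirroring the by-eye selection in the text. All names and numbers are hypothetical, and because the text does not say whether the fitted lines were forced through the origin, an unconstrained fit is assumed.

```python
import math


def fit_slope(x, y, in_linear_range):
    """Least-squares slope using only the points flagged as linear."""
    points = [(xi, yi) for xi, yi, keep in zip(x, y, in_linear_range) if keep]
    if len(points) < 2:
        raise ValueError("need at least two points in the linear range")
    n = len(points)
    mean_x = sum(px for px, _ in points) / n
    mean_y = sum(py for _, py in points) / n
    sxx = sum((px - mean_x) ** 2 for px, _ in points)
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in points)
    return sxy / sxx


def mean_and_standard_error(values):
    """Mean and standard error of the mean over replicate slopes."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)


dilutions = [1.00, 0.67, 0.40, 0.25, 0.17, 0.10, 0.067, 0.040]
# Hypothetical readings that plateau at 0.9 absorbance units.
replicate_1 = [min(2.0 * d, 0.9) for d in dilutions]
replicate_2 = [min(2.2 * d, 0.9) for d in dilutions]
# Hand-chosen mask standing in for the by-eye selection: keep <= 40% lysate.
linear_mask = [d <= 0.40 for d in dilutions]

slopes = [fit_slope(dilutions, r, linear_mask) for r in (replicate_1, replicate_2)]
average_slope, slope_error = mean_and_standard_error(slopes)
```

In this example the two saturated points (100% and 67% lysate) are excluded, the replicate slopes are 2.0 and 2.2, and the reported estimate is 2.1 with a standard error of 0.1.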

To compare the activities (total substrate 
turnovers per enzyme) among the different P450 
variants, it is first necessary to normalize to the en- 
zyme concentration. To do this, we took the ratio of 
the slope for each substrate divided by the slope of 
the CO difference spectrum, propagating the errors.
These normalized slopes are proportional to the ac- 
tivity on each substrate. The normalized slopes 
are given in Additional file 2. This file also lists
the number of nonsynonymous mutations that each
P450 variant possesses relative to the R1-11 parent
sequence, as originally reported in [16]. These
normalized slopes allow for accurate comparisons 
among the P450 variants, and were used in the 
analyses in this paper. To convert these normalized 
slopes into total substrate turnovers per enzyme, it 
is necessary to multiply them by the ratio of extinc- 
tion coefficients. The extinction coefficient for the 
CO difference spectrum reading (the absorbance at 
450 nm minus that at 490 nm) is 91 mM⁻¹ cm⁻¹ [48],
and we calculated the extinction coefficient at 398
nm for the 4-nitrophenolate group in our buffer to be
12,000 M⁻¹ cm⁻¹. Therefore, for 12-pNCA, the total
number of substrate turnovers per P450 enzyme
is 7.58 times the ratio of the 12-pNCA standard 
curve slope to the CO difference spectrum slope. 
This indicates that our parent protein had about
250 12-pNCA turnovers per enzyme, compared to
the 1,000 reported for a variant engineered for maxi- 
mal 12-pNCA activity [27]. For the other substrates 
assayed with the 4-AAP assay, the extinction coeffi- 
cient at 500 nm for the 4-AAP/phenol complex has 
been reported to be 4,800 M⁻¹ cm⁻¹ [51]. However, we believe
that this extinction coefficient could be of dubious 
accuracy for our data. Depending on the exact type 
of phenolic compound created by P450 hydroxylation,
the extinction coefficient for the 4-AAP/phenol
complex may vary. Assuming the extinction coefficient
of 4,800 M⁻¹ cm⁻¹ is accurate, then the total
number of substrate turnovers per P450 enzyme is 
19.0 times the ratio of the substrate standard curve 
slope to the CO difference spectrum slope. Using 
this coefficient, the parent P450 had roughly 1,000 
turnovers on 2-phenoxyethanol, 30 turnovers on propranolol,
400 turnovers on 11-phenoxyundecanoic
acid, 50 turnovers on 2-amino-5-chlorobenzoxazole,
and 80 turnovers on 1,2-methylenedioxybenzene.
The high activities on 2-phenoxyethanol and 11-phenoxyundecanoic
acid are presumably due to the
fact that lack of polar substituents on the aromatic 
ring allows these compounds to enter the hydropho- 
bic P450 BM3 binding pocket [29] more easily than 
12-pNCA. However, we emphasize that the exact 
numerical values for the turnovers for these five sub- 
strates are questionable. Definitive determination 
of the extinction coefficients would require analyti- 
cal analysis of the enzymatic products for each P450 
variant on each substrate, which is beyond the scope 
of this study. 
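
The normalization and conversion to turnovers can be illustrated with a short Python sketch. The conversion factors are the ratios of the extinction coefficients quoted above (91,000/12,000 = 7.58 and 91,000/4,800 = 19.0). The text says only that errors were propagated, so the quadrature rule for a ratio used here is an assumption, and the function names and example slopes are hypothetical.

```python
import math

# Extinction coefficients quoted in the text, in M^-1 cm^-1.
EPS_CO = 91_000.0    # CO difference spectrum (450 nm minus 490 nm)
EPS_PNCA = 12_000.0  # 4-nitrophenolate at 398 nm
EPS_4AAP = 4_800.0   # 4-AAP/phenol complex at 500 nm (uncertain per the text)


def normalized_slope(substrate_slope, substrate_err, co_slope, co_err):
    """Ratio of substrate slope to CO slope, with a propagated error.

    Assumes independent errors, so relative errors add in quadrature.
    """
    ratio = substrate_slope / co_slope
    relative_err = math.sqrt(
        (substrate_err / substrate_slope) ** 2 + (co_err / co_slope) ** 2
    )
    return ratio, abs(ratio) * relative_err


def turnovers(norm_slope, eps_product):
    """Total substrate turnovers per enzyme from a normalized slope."""
    return norm_slope * EPS_CO / eps_product


# Hypothetical slopes: substrate 3.0 +/- 0.3, CO difference 0.10 +/- 0.01.
ratio, ratio_err = normalized_slope(3.0, 0.3, 0.10, 0.01)
pnca_turnovers = turnovers(ratio, EPS_PNCA)

pnca_factor = EPS_CO / EPS_PNCA  # about 7.58
aap_factor = EPS_CO / EPS_4AAP   # about 19.0
```

For the made-up slopes above, the normalized slope is 30 with an error of about 4.2, which would correspond to roughly 228 turnovers if the product were 4-nitrophenolate.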



Analysis of activity data 

The raw activity values computed for the P450 variants
are listed in Additional file 2. To analyze and
display this data, we computed the fold change in
activity of each variant relative to the R1-11 parent
P450. The fold change is simply the variant activity 
divided by the parent activity on each substrate, 
with the standard errors propagated to give an error
on the fold change. In Figures 2 and 3, these
fold changes are displayed on a logarithmic scale so
that each unit corresponds to a two-fold increase
or decrease in activity. In Figure 2, the substrates
and the P450 variants have both been clustered, as 
shown by dendrograms on the side of the heat map. 
The clustering was performed using the standard
hierarchical clustering function of the R statistical
package. This is complete linkage hierarchical clustering,
with the distances computed as the Euclidean
distance between the logarithms of the fold changes
in activity. The standard errors on the fold changes
in activity are not incorporated into Figure 2 or any
of the related analysis. However, these standard errors
are shown in Figure 3; it is apparent from this
figure that the errors tend to be much less than the 
fold changes in activity themselves. 
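
To make the clustering procedure concrete, the following Python sketch computes base-2 log fold changes and runs a plain complete-linkage agglomerative clustering on Euclidean distances. It is a stand-in written for illustration, not the R routine used in the study; the variant labels and activity values are hypothetical.

```python
import math


def log2_fold_changes(variant_activities, parent_activities):
    """Base-2 logarithm of variant activity over parent activity."""
    return [math.log2(v / p) for v, p in zip(variant_activities, parent_activities)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(profiles):
    """Agglomerative clustering that returns the merge history.

    `profiles` maps a label to its vector of log2 fold changes. Each merge
    is (members_a, members_b, distance), where the distance between two
    clusters is the largest pairwise distance between their members.
    """
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = max(
                    euclidean(profiles[a], profiles[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical activities on two substrates for a parent and three variants.
parent = [10.0, 20.0]
profiles = {
    "V1": log2_fold_changes([20.0, 20.0], parent),  # [1, 0]
    "V2": log2_fold_changes([20.0, 10.0], parent),  # [1, -1]
    "V3": log2_fold_changes([2.5, 80.0], parent),   # [-2, 2]
}
merges = complete_linkage(profiles)
```

Here V1 and V2 merge first at distance 1, and V3 joins them at the larger of its two pairwise distances, which is what distinguishes complete linkage from single or average linkage.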

In Figure 4, the histogram bins are logarithmically
spaced so that each bin contains a 2^0.5-fold
range of activities. For example, the histogram bin
centered at one contains all variants with between
2^-0.25 = 0.84 and 2^0.25 = 1.19 fold the parental
activity, while the bin centered at 1.5 contains all
variants with between 2^0.25 = 1.19 and 2^0.75 = 1.68
fold the parental activity.
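
The bin boundaries quoted above can be reproduced with a few lines of Python. The helper names are hypothetical; bins are indexed by an integer k so that bin k is centered, on the logarithmic scale, at 2^(k/2).

```python
import math

BIN_WIDTH = 0.5  # each bin spans a 2**0.5-fold range of activities


def bin_edges(k, width=BIN_WIDTH):
    """Lower and upper fold-change edges of the bin centered at 2**(k*width)."""
    center = k * width
    return 2 ** (center - width / 2), 2 ** (center + width / 2)


def bin_index(fold_change, width=BIN_WIDTH):
    """Index k of the logarithmic bin that contains a fold change."""
    return math.floor(math.log2(fold_change) / width + 0.5)


edges_parent_bin = bin_edges(0)  # bin centered at a fold change of one
edges_next_bin = bin_edges(1)    # next bin up, log-scale center 2**0.5

for low, high in (edges_parent_bin, edges_next_bin):
    print(f"{low:.2f} to {high:.2f}")
```

Note that the second bin has its logarithmic center at 2^0.5, or about 1.41, so a variant with 1.5-fold the parental activity falls inside it.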

The principal component analysis shown in Table 1
was performed using the R statistical package,
with inputs being the logarithms of the fold changes 
in activity. Since these log fold changes in activity 
contained no arbitrary units (they were already nor- 
malized to the parent), the data was neither scaled 
nor zeroed before performing the analysis. Table 1
shows the composition and the percent of variance 
explained (the eigenvalue for that component di- 
vided by the sum of all eigenvalues) for the first two 
components. The remaining four components were 
relatively unimportant, explaining 7%, 5%, 4%, and 
2% of the total variance. 
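
The sketch below illustrates what a principal component analysis without centering or scaling computes, and how the percent of variance is obtained as an eigenvalue divided by the sum of all eigenvalues. It is a pure-Python illustration that uses power iteration to find only the first component; it is not the R routine used in the study, and the small data matrix is hypothetical.

```python
import math


def cross_product_matrix(rows):
    """X^T X for data rows that are neither centered nor scaled."""
    p = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]


def leading_component(matrix, iterations=1000):
    """Leading eigenvector and eigenvalue of a symmetric matrix.

    Uses power iteration from a uniform start vector, which is assumed
    not to be orthogonal to the leading eigenvector.
    """
    p = len(matrix)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * matrix[i][j] * v[j] for i in range(p) for j in range(p)
    )
    return v, eigenvalue


def fraction_of_variance(matrix, eigenvalue):
    """Eigenvalue divided by the sum of all eigenvalues (the trace)."""
    return eigenvalue / sum(matrix[i][i] for i in range(len(matrix)))


# Hypothetical log fold changes: three variants (rows), two substrates (columns).
log_fold_changes = [[2.0, 1.0], [1.0, 0.0], [-1.0, -1.0]]
xtx = cross_product_matrix(log_fold_changes)
component, eigenvalue = leading_component(xtx)
fraction = fraction_of_variance(xtx, eigenvalue)
```

Because the data are not centered, the first component here reflects the overall magnitude of the log fold changes as well as their spread, which matches the interpretation of the first component in Table 1 as a general catalytic ability.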



Phylogenetic tree 

The phylogenetic tree shown in Figure 1 is based on
the number of nonsynonymous mutations the P450
variants have relative to the R1-11 neutral evolution
parent, as reported in [16]. Each of the P450s that
evolved in a monomorphic population (prefix of M)
is known to have diverged independently, and so
is drawn on its own branch regardless of any sequence
identity to other variants. The exact phylogenetic
relationship of the P450s that evolved in the
polymorphic population (prefix of P) is not known,
so the portion of the tree for these mutants was
reconstructed by maximum parsimony. The tree is
based only on the nonsynonymous mutations, with
all mutations weighted equally. Full nucleotide and
amino acid sequences of the P450s can be found
in [16].

Figure 1 - Phylogenetic tree of the neutrally evolved P450s

The tree shows the relationship among the 34 neutrally evolved P450 variants examined in this study. All 
of the P450s neutrally evolved from the same R1-11 parent P450. The horizontal lengths of the branches
are proportional to the number of nonsynonymous mutations, as indicated by the scale bar. The vertical
arrangement of the branches is arbitrary.

Figure 2 - Activities of the neutrally evolved P450s on the six substrates.

The heat map shows the fold change in activity of all 34 neutrally evolved P450 variants on all six substrates.
Each row shows the data for a different P450 variant, while each column shows the activity on a different 
substrate. The fold change in activity is the ratio of the variant's activity to that of the neutral evolution 
parent. Both the substrates and the P450 variants are hierarchically clustered according to the activity 
profiles, as shown by the dendrograms at top and left. Substrate abbreviations: PROP - propranolol, 2A5C
- 2-amino-5-chlorobenzoxazole, MDOB - 1,2-methylenedioxybenzene, 2PE - 2-phenoxyethanol, 11PA -
11-phenoxyundecanoic acid. The standard errors for the changes in activity displayed in the heat map tend to
be much smaller than the changes themselves; these errors are shown explicitly in Figure 3.

Figure 3 - Fold changes in P450 activities with standard errors.

The bar graphs show the fold change in activity of all 34 neutrally evolved P450 variants on all six substrates.
This is the same data as in Figure 2, except these graphs also give error bars showing the standard errors in
two separate measurements of the activities. In most cases the standard errors are much smaller than the 
activity changes themselves.

Figure 4 - Distributions of activity changes on each of the six substrates.

The histograms show the distributions of fold changes in activity for all 34 neutrally evolved P450 variants 
on each of the six substrates, with a value of one indicating that the activity is the same as the neutral 
evolution parent. 






Tables 

Table 1 - Principal component analysis of activity profiles. 

The first two principal components explain 82% of the variance in P450 activity profiles. The table shows 
the composition of these two components and the variance explained by each. The first component contains 
positive contributions from all substrates and can be thought of as representing a general high catalytic 
ability. The second component can be thought of as representing discrimination between fused ring substrates 
(PROP, 2A5C, and MDOB) and phenolic ether substrates (12-pNCA, 11PA, 2PE). Substrate abbreviations
are as defined in the legend to Figure 2. The analysis was performed on the logarithms of the fold changes
in activity.

[Table body incomplete in source. Recoverable entries: first component (62% of variance); second component (20% of variance); loadings -0.39, 0.14, 0.70, 0.51, 0.19, with substrate assignments lost.]

Table 2 - Correlations between changes in activity and number of mutations.

The extent of change in activity is positively correlated with the number of nonsynonymous mutations the 
P450 has undergone relative to the neutral evolution parent. Each column shows the Pearson correlation 
between the number of nonsynonymous mutations and the absolute value of the logarithm of the fold change 
in activity for a different substrate, computed over all 34 P450 variants. The final column (ALL) is the 
correlation among the 6 x 34 pooled data points for all six substrates. The P-values are shown in parentheses; 
none of the correlations for the individual substrates are significant at a 1% level (due to the small number 
of data points), but the overall correlation for all substrates is highly significant. Substrate abbreviations 
are as defined in the legend to Figure 2.

Additional Files

Additional file 1 — Standard curves used to determine P450 activities. 

The PDF file shows all of the standard curves used to determine the P450 concentration and enzymatic 
activities. Points that were deemed to fall in the linear range, and so used to compute the standard curve 
slopes, are solid. Points that were deemed outside of the linear range are empty. Each curve shows the 
slopes computed for two independent replicates, and the average slope with standard error. These average 
slopes were used to compute the P450 activities. 

Additional file 2 — Raw activity and sequence data. 

The text file gives the activities of the P450 variants on each of the six substrates, as measured in this study. 
It also lists the number of nonsynonymous mutations (M_NS) relative to the R1-11 neutral evolution parent
sequence, as originally reported in [16]. Each row lists the data for a different P450 variant, and standard 
errors for the measured activities are shown in parentheses. Activities are the normalized standard curve 
slopes as described in the Methods.